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Abstract 

Missing data estimation is an important challenge with high-dimensional 
data arranged in the form of an array.In this paper we propose a proba- 
bility model for partially observed multi-way array data. Fisher scoring 
and expectation maximization are used for estimation of the parameters 
of this distribution. The main application is to missing data imputation 
for multi way data. 



1 Introduction 

A vector is a one way array, a matrix is a two way array, by stacking matri- 
ces we obtain three way arrays, etc, ... Array variate random variable s up to 
two dimensions has been studied intensively in iGupta and Nagarl 2000| | and by 



many others. For arrays observations of 3, 4 or in general i dimensions proba 



Srivastava et al.l |2008aj and lOhlson et al.1 [2011] ) . 



pr 

bility models have been pro posed very recently in ([Akdemir and Guptal [201 1| , 



Incomplete data are a major concern for the analysis of array variate random 
variables. The purpose of this article is to develop likelihood based methods for 
estimation and inference for a class of array random variables when we only 
have partially observed arrays. 

In Section 2, we introduce a normal model for array variables. In Section 
3, we introduce the Full EM and the Hybrid FS-EM algorithms for parameter 
estimation and missing data imputation. Two examples illustrating the use of 
these algorithms are in Section 4. 



2 Array Normal Random Variable 

The family of normal densities with Kronecker delta covariance structure are 
given by 

<KX; M,Ai,A 2 ,...Ai) = Z^X iA ' 1)2 ' ' rf^ " ^" 2) « 
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where Ai, A 2 , ■ ■ ■ , Ai are nonsingular m atrices of orders mi, m 2 , . . . , rm; the 
R-Matrix multiplication (jRauhalal J2002J) which generalizes the matrix multi- 



plication (array multiplication in two dimensions) to the case of fc-dimensional 
arrays is defined element wise as 



((A 1 ) 1 (A 2 ) 2 ...(A i y 

m\ m2 m.3 mj 

y j (^l)giri y j ( J ^2)g 2 r 2 (• / 43)g 3 r 3 ■ ■ ■ { J ^i)qir i {X)nr 2 ...r 



r\— 1 T2 — 1 T3 — 1 Ti= 

and the square norm of X miX m 2 x...mi is defined as 

mi rri2 rrij 

n*n 2 = £ £■■■£((* w.*) 

3i=l Ja=l 3<=1 
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Not e that R-Mat rix multiplication is sometimes referred to as the Tucker prod- 
uct (|Koldal [2006^ . 

The main advantage in choosing a Kronecker structure is the decrease in 
the number of parameters. The estimation and inference for the parameters 
of the array normal distribution with Kronecker delta covariance structure, 
based on a random sample of fully observed array s \X\, X 2l ■ ■ ■ , Xn}, can 
been accomplished by max i mum likelihood estimatio n dSrivastava et al l |2008bj . 
lAkdemir and Guptal l201ll.lSrivastava et all [2008aj and lOhlson et all [201 lj ) or 



by Bayesian estimation (|Hon1 [2011 1) 



The operator rvec describes the relationship between X miXm2X _ /rrii and its 
monolinear form x mim2 ... miX i- rvec(X miXm2X ... mi ) = x mim2 „. miX i where x is 
the column vector obtained by stacking the elements of the array X in the order 
of its dimensions; i.e., (X)j 1 j 2 ...j i = (x)j where j = (ji — \)mi-\mi^ 2 ■ ■ .roi + 
(ji - 2)TO i _2"i i _ 3 . . . mi + . . . + (j 2 - l)mi + ji. 

The following are very useful properties of the array normal variable with 
Kronecker Delta covariance structure. 

Property 2.1 If X ~ <j,(X; M, Ai, A 2 , ■■■ Ai) then rvec(X) ~ cj)(rvec(X); 
rvec(M), Ai <g> . . . ® A 2 ®Ai). 

Property 2.2 If X ~ (f>(X;M,Ai,A 2 ,...Ai) then E(rvec(X)) = rvec(M) 
and cov(rvec(X)) = (Ai <S> . . . <g> A 2 ® Ai)(Ai ® . . . ® A 2 <g> Ax)'. 

In the remaining of this paper we will assume that the matrices Ai are 
square root of the positive definite matrices S; for i = 1,2, ... ,i and we will 
put A = Si ... <g> S 2 ® Si • 

3 Updating Equations for the Parameters 

Using linear predictors for the purpose of i mputing missing values in multivariate 



normal data dates back at least as far as ([Anderson! [1957[). The EM algorithm 
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( Dempster et al. 1977| ) is usually utilized for multiva riate normal distribution 

with missin g data. The EM met h od goes back to (lOrchard and Woodburvl 

19721 ) and ([Beale and Little! [l975| ) . iTrawinski and Bargmanrl 1964| and Hartley and Hockine 
1971] developed the Fisher scoring algorithm for incomplete multivariate nor- 
mal data. 

Let x be a k dimensional observation vector which is partitioned as 



R 






M 


x = 


•Em 



where x r and x m represent the vector of observed values and the missing ob- 
servations correspondingly. Here 

R 

M 

is an orthogonal permutation matrix of zeros and ones and 



R 


f 




M 







The covariance of 



R 

M 



is given by 
cov(x) 



R 
M 



3.1 Fisher Scoring Algorithm 
3.1.1 Score Function for M. 

Let X\, X2, . . . , Xn be a random sample of array observations from the distri- 
bution with density 4>(X; M, A\, A2, 4i). When the covariance parameters 

are known the score function for M. is readily available by using 
the array-monolinear form relationship in Property 12.11 and the corresponding 
theory for the multivariate normal variable with missing observations. 
Let X\ = rvec(Xi) and 



for I = 1,2, 



" Ri ' 






Mi _ 


Xl = 





. , N. The score function for Ai is given by 

N 

\£f(M) = ^R' l {RiKR' l )- 1 (x rl - Rirvec(M)). 



1=1 



The estimating equation ^f(Ai) — gives the explicit solution 
rvec(M) = J -1 > ' i^Ai^" 1 - 



JV 

1=1 



(2) 
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where J is the information matrix for rvec(M) and is given by 



N 



j = J2r[(r 1 ar' 1 )- 1 r 1 . 



The asymptotic covariance for rvec(At) is therefore J . 
3.1.2 Score Function for Ak 

Let Xi, X 2 , ■ ■ ■ , Xn be a, random sample of array observations from the distri- 
bution with density 4>(X; M., Ai,A 2 , - ■ ■ A4). 

Assume M and all of A\, A2, ■ ■ ■ , A% are known except for Ak- In this case, 
the variable 

Z = (.Ai" 1 ) W) 2 • • • {A k - l )- 1 ) k -\L mk )\A- k l l ) k+1 . . . (Ar'nx M) 

has density <j)(Z; 0, I mi , I m2 , . . . I mk _ 1 ,A k ,I mk _ 1 I m ,)- 

Now, let Z( fe ) denote the m k x Y\j^ mj matrix obtained by stacking the 

elements of Z along the fcth dimension. Hence, we can write Z( k ) ~ <P{Z{k)', 
°"ik xUj^k m^Ak, lu&k m 3 )' tnerefore tne corresponding random sample (Z( k )i, 
Z( fe )2, • • • , ^(fe)jv) = (zi, «2, ••• z^n^™,) provides a random sample of size 
TV II j^fe m j f rom the mfc-variate normal distribution with mean zero and covari- 
ance £/c = AkA' k . 

Let C7fc im denote the Zmth element of for 1 < Z < m < rrik- The corre- 
sponding elements of the score function for under multivariate normality are 
given by () 

*(S fe ) ;m = £ tr{M/ fc!m? (2 9 4-S ferr J} 

where 

The sensitivity matrix Sk for S^, dchned as the expected derivative of the 
estimating function \E , (S / t); m with respect to the entries X, has elements given 

by 

N Wj^k rrtj 

and dimension (mfe(m/s + l)/2) 2 . The Newton scoring algorithm for is hence 
given by means of the update 

ni +i =n-s(nr^(n) (3) 

where the result of the matrix product S'(Sfe) _lv I'(£fe) is understood as a mf, 
symmetric matrix with lower triangle defined by symmetry. 
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3.2 The EM Algorithm 



Let Xi,X2, ■ ■ ■ be a random sample of array observations from the dis- 
tribution with density <j>{X; Ai, Ai, A2, 4j). Let the current values of the 

parameters be Ai*, A\,A\,...A\. 

3.2.1 The updating equation for Ai 

The updating equation of the parameter Ai is given by 

1 N o 
rvec(M t+1 ) = — ^Vuec(X;) 

2=1 

N 

= rvecM* + ^ k l R'^Rik 1 R'^ 1 {x rl - R^ve^M*)) (4) 
1=1 



3.2.2 The updating equation for Sfe 

Let 

z = {A\- l Y{Al~ 1 ) 2 ■ ■ ■ (A-i^-HimjHK+i' 1 )^ 1 ■ ■ ■ (4~ x Y(x - m- 

Let denote the x Yij=£k m j ma trix obtained by stacking the elements of 
Z along the fcth dimension with the qth column represented by z q . The updating 
equation for Sfc is given by 

*i +1 = NIl — E [M;+^(=L^-=L, E *ii E L..) M J- ( 5 ) 



Jjtk q=l 



4 Flip-Flop Algorithm for Incomplete Arrays 



Inference about the parameters of the model in (T O) for the matrix variat e 



case has been consid e red in the statistical litera t ure ( Rov and Khattreel 2003j , 
Roy and Leiva 2008 1. Lu and Zimmerman 120051. Srivastava et a l. 2008b , etc.). 
The Flip-Flop Algorithm Srivastava et al.l [2008b| is proven to attain maximum 
likelihood estimators of the parameters o f two d imensional ar r ay va riate nor- 



mal distribution. In (|Akdemir and Guptal [201 lj , lOhlson et al.l [201 1| and iHoff 



2011] ). the flip flop algorithm was extended to general array variate case. 



For the incomplete matrix variate observations with Kronecker delta covari- 
ance structure par ameter estimation and missi ng data imputation methods have 
been developed in lAllen and Tibshiranil [2010) . 

The following is a modification of the Flip-Flop algorithm for the incomplete 
array variable observations: 

Algorithm for estimation: 

Given the current values of the parameters, repeat steps 1 and 2 until con- 
vergence: 
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1. Update M using © or ©, 

2. For fc = 1, 2, . . . ,i update S fc using © or ©. 

Note that at each step of this algorithm we can choose the EM or Fisher 
Scoring updating equations. Therefore there are four modifications possible: 

1. Full FS: Both steps of the estimation algorithm uses the Fisher scoring 
updating equations. 

2. Full EM: Both steps of the estimation algorithm uses the EM updating 
equations. 

3. Hybrid FS-EM: First step uses the Fisher scoring update and second step 
uses the EM update. 

4. Hybrid EM-FS: First step uses the EM update and second step uses the 
Fisher scoring update. 

In the following we have only implemented the Full EM and the Hybrid 
FS-EM algorithms. 



5 Illustrations 



Example 5.1 In this first example we have simulated data from a 2 x 2 array 
normal distribution with differing number of observations. For each sample size, 
we have repeated the experiment 10 times. The convergence of the estimator of A 
is checked by reporting the mean L = ||A — A|| 2 over 10 trials at each sample size. 



True covariance components were 



and 



.6 



-.6 



1 



. Sample sizes 



2 .6 
.6 3 

50,100, 200 and 500 were used. Missing data intensity defined as the proportion 
of the number of randomly selected (with replacement) data points that were set 
to missing to the total number of data points, in the experiments this was set 
to j. Figure [7] display the results from the Hybrid and EM algorithms. As the 
number of observations increase, L decreases towards zero. 

Examp le 5.2 In this example, we use a subset of the data previously analyzed 
bu \Basford et al. fl99A l. The most co mprehensive analyses of the se data as well 
as experimental details can be found in Wasford and TukeA \199&} . The data set 
involved measurements on 58 different soybean lines observed on 6 traits and in 8 
environments. Because of the low number of replications, we have only included 
the first 20 lines in our analysis. We have assumed that the 20 x 6 matrices of 
observations from different environments (4 locations, 2 times) were independent 
and identically generated from a two way array (matrix) normal distribution. 
We have deleted all the observations for the lines 1 through 10 for the last 
environment and estimated these using the Full EM algorithm. The average 
correlation between the true and the estimated values over the 6 variables was 
0.57. We have also used applied imputation using 2-nearest neighbors regression 
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Figure 1: The convergence of the Full EM (Left) and the Hybrid FS-EM (Right) 
algorithms. As the number of observations increase, L decreases towards zero. 
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tHastie et al. \2001 ]) and random forest regression iStekhoven and Buhlmanr\ 
\2012\ l) using the 120 x 8 data matrix representing the 120 variable-location 
pairs and 8 replications. The corresponding correlation values were 0.55 and 
0.57. 



6 Conclusions 

We have formulated a parametric model for array variate data and developed 
suitable estimation methods for the parameters of this distribution with possi- 
bly incomplete observations. The main application of this paper has been to 
multi-way regression (missing data imputation), once the model parameters are 
given we are able to estimate the unobserved components of any array from the 
observed parts of the array. We have assumed no structure on the missingness 
pattern other than assuming that it is fixed. 

The methods developed here use the assumption that the data is generated 
from a distribution with Kronecker delta covariance structure. The suitability 
of this model to any data set is questionable. The choice of model and and 
determination of its order could be accomplished using a model selection criteria 
based on the likelihood function which is available through the results in this 
paper. 

References 

D. Akdemir and A. K. Gupta. Array variate random variables with multiway 
kronecker delta covariance matrix structure. Journal of Algebraic Statistics, 
2(1):98-113, 2011. 

G.I. Allen and R. Tibshirani. Transposable regularized covariance models with 
an application to missing data imputation. The Annals of Applied Statistics, 
4(2):764-790, 2010. 

T.W. Anderson. Maximum likelihood estimates for a multivariate normal dis- 
tribution when some observations are missing. Journal of the american Sta- 
tistical Association, 52(278):200-203, 1957. 

K.E. Basford and J.W. Tukey. Graphical analysis of multiresponse data: Illus- 
trated with a plant breeding trial. CRC Press, 1999. 

KE Basford, PM Kroonenberg, and IH DeLacy. Three-way methods for multi- 
attribute genotype u environment data: an illustrated partial survey. Field 
Crops Research, 27(1-2):131-157, 1991. 

E. M.L. Beale and R.J. A. Little. Missing values in multivariate analysis. Journal 
of the Royal Statistical Society. Series B (Methodological) , pages 129-145, 
1975. 



8 



A. P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from in- 
complete data via the em algorithm. Journal of the Royal Statistical Society. 
Series B (Methodological), pages 1-38, 1977. 

A.K. Gupta and D.K. Nagar. Matrix Variate Distributions. Chapman and 
Hall/CRC Monographs and Surveys in Pure and Applied Mathematics. Chap- 
man and Hall, 2000. 

HO Hartley and RR Hocking. The analysis of incomplete data. Biometrics, 
pages 783-823, 1971. 

T. Hastie, R. Tibshirani, B. Narasimhan, G. Chu, T. Hastie, R. Tibshirani, 
B. Narasimhan, and G. Chu. impute: Imputation for microarray data. Bioin- 
formatics, 17(6):520-525, 2001. 

P.D. Hoff. Hierarchical multilinear models for multiway data. Computational 
Statistics & Data Analysis, 55(l):530-543, 2011. 

T.G. Kolda. Multilinear operators for higher-order decompositions. United 
States. Department of Energy, 2006. 

N. Lu and D.L. Zimmerman. The Likelihood Ratio Test for a Separable Covari- 
ance Matrix. Statistics & Probability Letters, 73(4):449-457, 2005. 

M. Ohlson, M. Rauf Ahmad, and D. von Rosen. The multilinear normal dis- 
tribution: Introduction and some basic properties. Journal of Multivariate 
Analysis, 2011. 

T. Orchard and M.A. Woodbury. A missing information principle: theory and 
applications. In Proceedings of the 6th Berkeley Symposium on Mathematical 
Statistics and Probability, volume 1, pages 697-715, 1972. 

U.A. Rauhala. Array Algebra Expansion of Matrix and Tensor Calculus: Part 
1. SIAM Journal on Matrix Analysis and Applications, 24:490, 2002. 

A. Roy and R. Khattree. Tests for Mean and Covariance Structures Relevant 
in Repeated Measures Based Discriminant Analysis. Journal of Applied Sta- 
tistical Science, 12(2):91-104, 2003. 

A. Roy and R. Leiva. Likelihood Ratio Tests for Triply Multivariate Data 
with Structured Correlation on Spatial Repeated Measurements. Statistics & 
Probability Letters, 78(13):1971-1980, 2008. 

MS Srivastava, T. Nahtman, and D. von Rosen. Estimation in General Mul- 
tivariate Linear Models with Kronccker Product Covariance Structure. Re- 
search Report Centre of Biostochastics, Swedish University of Agriculture sci- 
ence. Report, 1, 2008a. 

M.S. Srivastava, T. von Rosen, and D. Von Rosen. Models with a Kronecker 
Product Covariance Structure: Estimation and Testing. Mathematical Meth- 
ods of Statistics, 17(4):357-370, 2008b. 



9 



D.J. Stekhoven and P. Biihlmann. Missforest nonparametric missing value im- 
putation for mixed type data. Bioinformatics, 28(1):112— 118, 2012. 

I.M. Trawinski and RE Bargmann. Maximum likelihood estimation with in- 
complete multivariate data. The Annals of Mathematical Statistics, 35(2): 
647-657, 1964. 



10 



